The TREC-6 Spoken Document Retrieval Track
Abstract
The Text REtrieval Conference (TREC) workshops provide a forum for different groups to compare retrieval systems on common retrieval tasks. The 1997 TREC workshop will feature a Spoken Document Retrieval task for the first time. This paper motivates the task and describes the measures to be used to evaluate the effectiveness of the retrieval methodologies.
1. The Text REtrieval Conference
The Text REtrieval Conference (TREC) series is cosponsored by the National Institute of Standards and Technology (NIST) and the Information Technology Office of the Defense Advanced Research Projects Agency (DARPA) as part of the TIPSTER Text Program. The series, which started in 1992, is designed to promote research in information retrieval by providing appropriate test collections, uniform scoring procedures, and a forum for organizations interested in comparing their results. Thirty-eight groups, including representatives from nine different countries, participated in TREC-5 in November 1996.
TREC has two main tasks, ad hoc and routing retrieval. The ad hoc task investigates the performance of systems that search a static set of documents using novel queries; the routing task investigates the performance of systems that use standing queries to search new streams of documents. In addition, TREC has smaller "tracks" that allow participants to focus on particular subproblems of the retrieval task. Recent track tasks have included Spanish retrieval, Chinese retrieval, the use of natural language processing techniques for retrieval, and retrieval of documents that result from paper documents being scanned by an Optical Character Recognition (OCR) process.
The retrieval of OCR documents was the focus of the TREC-5 "Confusion" track. The Confusion Track investigated methods for retrieving document surrogates whose true content has been confused or corrupted in some way.
A different form of corruption will be used in TREC-6: retrieving spoken documents (i.e., recordings of speech) through surrogates produced by speech recognition systems. This new track, the Spoken Document Retrieval (SDR) track, is intended to foster research on retrieval methodologies for spoken documents. A second goal of the track is to encourage collaboration between the speech and retrieval research communities. This paper defines the particular task to be addressed in the SDR Track and motivates the track's design. A detailed specification of the track, including sign-up procedures, samples of the data formats, and particulars of result submission, can be found at http://www.itl.nist.gov/div894/894.01/sdr97.txt. More information about TREC itself can be found at http://www-nlpir.nist.gov/trec. Questions about the track can be sent to either (or both) of the track organizers at [email protected] or [email protected].
Similar resources
TREC-6 1997 Spoken Document Retrieval Track Overview and Results
This paper describes the 1997 TREC-6 Spoken Document Retrieval (SDR) Track which implemented a first evaluation of retrieval of broadcast news excerpts using a combination of automatic speech recognition and information retrieval technologies. The motivations behind the SDR Track and background regarding its development and implementation are discussed. The SDR evaluation collection and topics ...
ETH TREC-6: Routing, Chinese, Cross-Language and Spoken Document Retrieval
ETH Zurich's participation in TREC-6 consists of experiments in the main routing task, both manual and automatic runs in the Chinese retrieval track, cross-language retrieval in each of German, French and English as part of the new cross-language retrieval track, and experiments in speech recognition and retrieval under the new spoken document retrieval track. This year our routing experiments...
Spoken Document Retrieval For TREC-7 At Cambridge University
This paper presents work done at Cambridge University, on the TREC7 Spoken Document Retrieval (SDR) Track. The broadcast news audio was transcribed using a 2-pass gender-dependent HTK speech recogniser which ran at 50 times real time and gave an overall word error rate of 24.8%, the lowest in the track. The Okapi-based retrieval engine used in TREC-6 by the City/Cambridge University collaborati...
AT&T at TREC-7 SDR Track
AT&T participated in the Spoken Document Retrieval (SDR) track of TREC-7. Our speech retrieval system uses modern Information Retrieval (IR) methods in conjunction with in-house automatic speech recognition. The novel feature of our TREC-7 work is the use of document expansion to reduce the performance loss due to ASR errors. Results show that retrieval from automatic transcriptions of speech i...
1998 TREC-7 Spoken Document Retrieval Track Overview and Results
This paper describes the 1998 TREC-7 Spoken Document Retrieval (SDR) Track which implemented an evaluation of retrieval of broadcast news excerpts using a combination of automatic speech recognition and information retrieval technologies. The motivations behind the SDR Track and background regarding its development and implementation are discussed. The SDR evaluation collection and topics are d...